A Hierarchical Document Description and Comparison Method

Authors

  • Burak Bitlis
  • Xiaojun Feng
  • Jacob L. Harris
  • Ilya Pollak
  • Charles A. Bouman
  • Mary P. Harper
  • Jan P. Allebach
Abstract

Determining the similarity of document images is an important first step for several document retrieval tasks, such as document classification, information extraction, and retrieval based on visual similarity. In this paper, we propose a method to describe and compare the content and layout of a document given only an image of the document. A tree structure is used to capture the hierarchical structure of the document. Two documents are then compared using a tree matching strategy.
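The abstract does not say how the tree is built or how the matching works. The sketch below is therefore an illustrative assumption, not the authors' method. Each node is a labeled layout region with a normalized bounding box. Two trees are scored recursively: nodes with different labels score 0, and otherwise the node's box overlap (intersection over union) is averaged with the score of a greedy one-to-one pairing of its children. The names `Region`, `iou`, and `tree_similarity`, the region labels, and the 0.5/0.5 weighting are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical box format: (x0, y0, x1, y1), normalized to [0, 1].
Box = tuple[float, float, float, float]


@dataclass
class Region:
    """Hypothetical tree node: one layout region of a document image."""
    label: str  # assumed region type, e.g. "page", "text", "image"
    box: Box
    children: list["Region"] = field(default_factory=list)


def area(b: Box) -> float:
    """Area of a box, clamped to 0 for empty boxes."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter = area((max(a[0], b[0]), max(a[1], b[1]),
                  min(a[2], b[2]), min(a[3], b[3])))
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def tree_similarity(a: Region, b: Region) -> float:
    """Similarity in [0, 1] between two region trees (illustrative only)."""
    if a.label != b.label:
        return 0.0
    local = iou(a.box, b.box)
    if not a.children and not b.children:
        return local
    # Score every child pair recursively, then pair children greedily,
    # highest score first, so each child is used at most once.
    pairs = sorted(
        ((tree_similarity(ca, cb), i, j)
         for i, ca in enumerate(a.children)
         for j, cb in enumerate(b.children)),
        reverse=True,
    )
    used_a, used_b, total = set(), set(), 0.0
    for score, i, j in pairs:
        if score > 0 and i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            total += score
    # Unmatched children (on either side) lower the score.
    child_score = total / max(len(a.children), len(b.children))
    return 0.5 * local + 0.5 * child_score


# Usage: two one-page layouts that differ only in the image region's extent.
text = Region("text", (0.1, 0.1, 0.9, 0.3))
page_a = Region("page", (0.0, 0.0, 1.0, 1.0),
                [text, Region("image", (0.1, 0.4, 0.9, 0.9))])
page_b = Region("page", (0.0, 0.0, 1.0, 1.0),
                [text, Region("image", (0.1, 0.6, 0.9, 0.9))])
print(tree_similarity(page_a, page_a), tree_similarity(page_a, page_b))
```

Identical trees score 1.0, and shrinking the image region lowers the score. The greedy pairing is a simplification. An optimal one-to-one assignment, for example with the Hungarian algorithm, would guarantee the best child pairing at a higher cost.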

Similar resources

The Functional Requirements for Bibliographic Records Model: A New Approach to Arranging Bibliographic Elements

Functional Requirements for Bibliographic Records (FRBR) is a conceptual model for arranging bibliographic records in catalogs and databases, proposed by IFLA in 1997 following a plan to revise the Anglo-American Cataloging Rules (AACR). This model tends to depart from other cataloging rules and uses a new structure for storing and displaying bibliographic record...

Document Clustering Based on Ontology and a Fuzzy Approach

Data mining, also known as knowledge discovery in databases, is the process of discovering unknown knowledge from a large amount of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering, one of the important techniques of text mining, is the unsupervised classification of similar documents into different groups. The most important step...

Document Comparison with a Weighted Topic Hierarchy

A method of document comparison based on a hierarchical dictionary of topics (concepts) is described. The hierarchical links in the dictionary are supplied with the weights that are used for detecting the main topics of a document and for determining the similarity between two documents. The method allows for the comparison of documents that do not share any words literally but do share concept...

Hierarchical Fuzzy Clustering Semantics (HFCS) in Web Document for Discovering Latent Semantics

This paper discusses the future of World Wide Web development, called the Semantic Web. Undoubtedly, Web services are among the most important services on the Internet and have had the greatest impact on the widespread adoption of the Internet in human societies. Internet penetration has been an effective factor in the growth of the volume of information on the Web. The massive growth of informat...

Comparison of Error Tree Analysis and TRIPOD BETA in Accident Analysis of a Power Plant Industry Using Hierarchical Analysis

Introduction: Given the importance and necessity of accident analysis, a proper technique must be used for precise accident analysis and for providing corrective and preventive measures that prevent an accident from recurring. Method: In this descriptive-analytical paper, the most important criteria for investigating and selecting accident investigation and analysis techniques and selecting...

Publication date: 2003